<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Decision Tree Feature Selection: Information Gain and the Gini Index</title>
    <link rel="stylesheet" href="https://cdn.staticfile.org/font-awesome/6.4.0/css/all.min.css">
    <link rel="stylesheet" href="https://cdn.staticfile.org/tailwindcss/2.2.19/tailwind.min.css">
    <link href="https://fonts.googleapis.com/css2?family=Noto+Serif+SC:wght@400;500;600;700&family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet">
    <script src="https://cdn.jsdelivr.net/npm/mermaid@latest/dist/mermaid.min.js"></script>
    <style>
        body {
            font-family: 'Noto Sans SC', Tahoma, Arial, Roboto, "Droid Sans", "Helvetica Neue", "Droid Sans Fallback", "Heiti SC", "Hiragino Sans GB", Simsun, sans-serif;
            background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
            min-height: 100vh;
        }
        
        .hero-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        }
        
        .card-hover {
            transition: all 0.3s ease;
        }
        
        .card-hover:hover {
            transform: translateY(-5px);
            box-shadow: 0 20px 40px rgba(0,0,0,0.1);
        }
        
        .text-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
            background-clip: text;
        }
        
        .code-container {
            background: #1e1e1e;
            border-radius: 12px;
            overflow: hidden;
            box-shadow: 0 10px 30px rgba(0,0,0,0.2);
        }
        
        .code-header {
            background: #2d2d2d;
            padding: 12px 20px;
            display: flex;
            align-items: center;
            gap: 8px;
        }
        
        .code-dot {
            width: 12px;
            height: 12px;
            border-radius: 50%;
        }
        
        pre {
            margin: 0;
            padding: 20px;
            overflow-x: auto;
            color: #d4d4d4;
            font-size: 14px;
            line-height: 1.6;
        }
        
        .formula-box {
            background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
            padding: 2px;
            border-radius: 8px;
            display: inline-block;
        }
        
        .formula-content {
            background: white;
            padding: 8px 16px;
            border-radius: 6px;
            font-family: 'Courier New', monospace;
            font-weight: 600;
        }
        
        .feature-card {
            background: white;
            border-radius: 16px;
            padding: 24px;
            box-shadow: 0 10px 30px rgba(0,0,0,0.08);
            border: 1px solid rgba(0,0,0,0.05);
        }
        
        .icon-circle {
            width: 60px;
            height: 60px;
            border-radius: 50%;
            display: flex;
            align-items: center;
            justify-content: center;
            font-size: 24px;
            color: white;
        }
        
        .mermaid {
            display: flex;
            justify-content: center;
            margin: 40px 0;
        }
        
        .drop-cap {
            float: left;
            font-size: 4em;
            line-height: 0.8;
            margin: 0.1em 0.1em 0 0;
            font-weight: 700;
            color: #667eea;
        }
    </style>
</head>
<body>
    <!-- Hero Section -->
    <section class="hero-gradient text-white py-20">
        <div class="container mx-auto px-6 text-center">
            <h1 class="text-5xl md:text-6xl font-bold mb-6 animate-pulse">
                Decision Tree Feature Selection
            </h1>
            <p class="text-xl md:text-2xl mb-8 opacity-90">
                Master information gain and the Gini index to build better decision tree models
            </p>
            <div class="flex justify-center gap-4 text-lg">
                <span class="bg-white bg-opacity-20 px-4 py-2 rounded-full">
                    <i class="fas fa-brain mr-2"></i>Machine Learning
                </span>
                <span class="bg-white bg-opacity-20 px-4 py-2 rounded-full">
                    <i class="fas fa-tree mr-2"></i>Decision Trees
                </span>
                <span class="bg-white bg-opacity-20 px-4 py-2 rounded-full">
                    <i class="fas fa-chart-line mr-2"></i>Feature Engineering
                </span>
            </div>
        </div>
    </section>

    <!-- Main Content -->
    <div class="container mx-auto px-6 py-12 max-w-6xl">
        
        <!-- Problem Description -->
        <div class="feature-card mb-8 card-hover">
            <div class="flex items-start gap-4">
                <div class="icon-circle bg-gradient-to-r from-purple-500 to-pink-500">
                    <i class="fas fa-question-circle"></i>
                </div>
                <div class="flex-1">
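                    <h2 class="text-2xl font-bold mb-3 text-gray-800">Problem Description</h2>
                    <p class="text-gray-600 leading-relaxed">
                        <span class="drop-cap">W</span>hen building a decision tree, how do we choose the best feature to split on? This is one of the core questions in machine learning. The task is to implement feature selection based on information gain or the Gini index: quantify how much each feature contributes to the classification task, then pick the most discriminative feature as the split for the current node.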
                    </p>
                </div>
            </div>
        </div>

        <!-- Core Concepts -->
        <div class="grid md:grid-cols-2 gap-6 mb-8">
            <div class="feature-card card-hover">
                <div class="flex items-center gap-3 mb-4">
                    <div class="icon-circle bg-gradient-to-r from-blue-500 to-cyan-500">
                        <i class="fas fa-lightbulb"></i>
                    </div>
                    <h3 class="text-xl font-bold text-gray-800">Core Algorithms</h3>
                </div>
                <div class="space-y-3">
                    <div class="flex items-center gap-2">
                        <i class="fas fa-check-circle text-green-500"></i>
                        <span class="text-gray-700">Information gain</span>
                    </div>
                    <div class="flex items-center gap-2">
                        <i class="fas fa-check-circle text-green-500"></i>
                        <span class="text-gray-700">Gini index</span>
                    </div>
                    <div class="flex items-center gap-2">
                        <i class="fas fa-check-circle text-green-500"></i>
                        <span class="text-gray-700">Entropy calculation</span>
                    </div>
                </div>
            </div>
            
            <div class="feature-card card-hover">
                <div class="flex items-center gap-3 mb-4">
                    <div class="icon-circle bg-gradient-to-r from-green-500 to-teal-500">
                        <i class="fas fa-cogs"></i>
                    </div>
                    <h3 class="text-xl font-bold text-gray-800">Complexity Analysis</h3>
                </div>
                <div class="space-y-3">
                    <div class="flex items-center gap-2">
                        <i class="fas fa-clock text-blue-500"></i>
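                        <span class="text-gray-700">Time complexity: O(n × m)</span>
                    </div>
                    <div class="flex items-center gap-2">
                        <i class="fas fa-memory text-purple-500"></i>
                        <span class="text-gray-700">Space complexity: O(m)</span>
                    </div>
                    <p class="text-sm text-gray-600 mt-2">Here n is the number of samples and m is the number of features; the bounds describe a single feature-selection pass at one node.</p>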
                </div>
            </div>
        </div>

        <!-- Algorithm Visualization -->
        <div class="feature-card mb-8">
            <h3 class="text-2xl font-bold mb-6 text-center text-gray-800">
                <i class="fas fa-sitemap mr-2 text-purple-500"></i>Decision Tree Feature Selection Workflow
            </h3>
            <div class="mermaid">
                graph TD
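                    A["Start: raw dataset"] --> B[Compute the entropy of the target variable]
                    B --> C{Iterate over all features}
                    C -->|next feature| D[Compute the information gain of the feature]
                    D --> E[Record the information gain]
                    E --> C
                    C -->|all features evaluated| F[Select the feature with the largest information gain]
                    F --> G[Split the data on that feature]
                    G --> H[Recursively build the subtrees]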
                    
                    style A fill:#f9f,stroke:#333,stroke-width:2px
                    style F fill:#9f9,stroke:#333,stroke-width:2px
                    style H fill:#99f,stroke:#333,stroke-width:2px
            </div>
        </div>

        <!-- Key Formulas -->
        <div class="feature-card mb-8">
            <h3 class="text-2xl font-bold mb-6 text-gray-800">
                <i class="fas fa-calculator mr-2 text-pink-500"></i>Key Formulas
            </h3>
            <div class="grid md:grid-cols-2 gap-6">
                <div class="bg-gray-50 p-6 rounded-lg">
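                    <h4 class="font-bold text-lg mb-3 text-gray-700">Entropy</h4>
                    <div class="formula-box mb-4">
                        <div class="formula-content">
                            H = -Σ pᵢ × log₂(pᵢ)
                        </div>
                    </div>
                    <p class="text-sm text-gray-600">Measures the uncertainty of a dataset, where pᵢ is the fraction of samples in class i. The higher the entropy, the more mixed the class labels.</p>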
                </div>
                <div class="bg-gray-50 p-6 rounded-lg">
                    <h4 class="font-bold text-lg mb-3 text-gray-700">Gini Index</h4>
                    <div class="formula-box mb-4">
                        <div class="formula-content">
                            Gini = 1 - Σ pᵢ²
                        </div>
                    </div>
                    <p class="text-sm text-gray-600">Measures the impurity of a dataset, where pᵢ is the fraction of samples in class i. The smaller the value, the purer the data.</p>
                </div>
            </div>
        </div>
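
        <div class="feature-card mb-8">
            <p class="text-gray-600 leading-relaxed mb-4">
                To make the two formulas concrete, here is a minimal Java sketch that evaluates both on a vector of class counts. The class name <code>ImpurityDemo</code>, the method names, and the sample counts (9 positive, 5 negative) are illustrative assumptions rather than part of the task statement. The sketch treats 0 × log₂(0) as 0 and returns 0 for an empty count vector.
            </p>
            <div class="code-container">
<pre><code class="language-java">public class ImpurityDemo {

    // Total number of samples across all classes.
    static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    // Shannon entropy in bits: H = -sum(p_i * log2(p_i)).
    static double entropy(int[] counts) {
        int total = sum(counts);
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log2(0) is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Gini impurity: 1 - sum(p_i^2).
    static double gini(int[] counts) {
        int total = sum(counts);
        if (total == 0) return 0.0; // assumption: an empty node counts as pure
        double sumOfSquares = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    public static void main(String[] args) {
        int[] counts = {9, 5}; // hypothetical class counts
        System.out.printf("entropy = %.4f%n", entropy(counts));
        System.out.printf("gini    = %.4f%n", gini(counts));
    }
}
</code></pre>
            </div>
            <p class="text-sm text-gray-600 mt-4">
                For the counts {9, 5} this prints an entropy of about 0.9403 and a Gini value of about 0.4592. Both measures are 0 for a pure node and largest when the classes are evenly mixed.
            </p>
        </div>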

        <!-- Solution Approach -->
        <div class="feature-card mb-8">
            <h3 class="text-2xl font-bold mb-6 text-gray-800">
                <i class="fas fa-route mr-2 text-blue-500"></i>Solution Approach
            </h3>
            <div class="space-y-4">
                <div class="flex items-start gap-4">
                    <div class="bg-blue-100 text-blue-600 rounded-full w-8 h-8 flex items-center justify-center font-bold flex-shrink-0">
                        1
                    </div>
                    <div>
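                        <h4 class="font-semibold text-gray-800 mb-1">Compute the entropy before splitting</h4>
                        <p class="text-gray-600">First compute the entropy of the target variable; it is the baseline against which every feature is measured.</p>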
                    </div>
                </div>
                <div class="flex items-start gap-4">
                    <div class="bg-blue-100 text-blue-600 rounded-full w-8 h-8 flex items-center justify-center font-bold flex-shrink-0">
                        2
                    </div>
                    <div>
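                        <h4 class="font-semibold text-gray-800 mb-1">Evaluate each feature</h4>
                        <p class="text-gray-600">For each feature, compute the weighted entropy after splitting on it, weighting each subset by its share of the samples.</p>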
                    </div>
                </div>
                <div class="flex items-start gap-4">
                    <div class="bg-blue-100 text-blue-600 rounded-full w-8 h-8 flex items-center justify-center font-bold flex-shrink-0">
                        3
                    </div>
                    <div>
                        <h4 class="font-semibold text-gray-800 mb-1">Compute the information gain</h4>
                        <p class="text-gray-600">Information gain = entropy before the split - weighted entropy after the split</p>
                    </div>
                </div>
                <div class="flex items-start gap-4">
                    <div class="bg-blue-100 text-blue-600 rounded-full w-8 h-8 flex items-center justify-center font-bold flex-shrink-0">
                        4
                    </div>
                    <div>
                        <h4 class="font-semibold text-gray-800 mb-1">Select the best feature</h4>
                        <p class="text-gray-600">Choose the feature with the largest information gain as the split feature for the current node.</p>
                    </div>
                </div>
            </div>
        </div>
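
        <div class="feature-card mb-8">
            <p class="text-gray-600 leading-relaxed mb-4">
                The four steps above can be sketched on pre-tabulated class counts. This is an illustrative sketch only: the class <code>BestFeatureDemo</code>, the helper <code>Split</code>, the feature names, and the 14-sample toy counts are all hypothetical, and a full solution would first build these counts from the raw samples.
            </p>
            <div class="code-container">
<pre><code class="language-java">public class BestFeatureDemo {

    // Class counts in each branch produced by splitting on one feature (hypothetical helper).
    static final class Split {
        final String feature;
        final int[][] branchCounts; // branchCounts[v][k]: samples with feature value v and class k

        Split(String feature, int[][] branchCounts) {
            this.feature = feature;
            this.branchCounts = branchCounts;
        }
    }

    static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    // Step 1: entropy in bits of a class-count vector.
    static double entropy(int[] counts) {
        int total = sum(counts);
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log2(0) is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Steps 2 and 3: parent entropy minus the size-weighted entropy of the branches.
    static double informationGain(int[] parentCounts, int[][] branchCounts) {
        int total = sum(parentCounts);
        double weighted = 0.0;
        for (int[] branch : branchCounts) {
            int size = sum(branch);
            if (size == 0) continue; // an empty branch contributes nothing
            weighted += ((double) size / total) * entropy(branch);
        }
        return entropy(parentCounts) - weighted;
    }

    // Step 4: the feature with the largest gain (the first one wins ties).
    static String bestFeature(int[] parentCounts, Split[] splits) {
        String best = null;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (Split s : splits) {
            double gain = informationGain(parentCounts, s.branchCounts);
            if (gain > bestGain) {
                bestGain = gain;
                best = s.feature;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] parent = {9, 5}; // toy data: 9 positive and 5 negative samples
        Split[] splits = {
            new Split("outlook", new int[][] {{2, 3}, {4, 0}, {3, 2}}),
            new Split("humidity", new int[][] {{3, 4}, {6, 1}}),
            new Split("windy", new int[][] {{6, 2}, {3, 3}})
        };
        for (Split s : splits) {
            System.out.printf("%-8s gain = %.4f%n", s.feature,
                    informationGain(parent, s.branchCounts));
        }
        System.out.println("best feature: " + bestFeature(parent, splits));
    }
}
</code></pre>
            </div>
            <p class="text-sm text-gray-600 mt-4">
                On these toy counts the gains come out to roughly 0.247 for outlook, 0.152 for humidity, and 0.048 for windy, so outlook is selected. Working from count tables keeps the sketch short and makes the weighting in step 2 explicit.
            </p>
        </div>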

        <!-- Code Implementation -->
        <div class="mb-8">
            <h3 class="text-2xl font-bold mb-6 text-gray-800">
                <i class="fas fa-code mr-2 text-green-500"></i>Java Implementation
            </h3>
            <div class="code-container